Results 1 - 20 of 211
1.
J Med Chem ; 67(8): 6508-6518, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38568752

ABSTRACT

Computational models that predict pharmacokinetic properties are critical to deprioritize drug candidates that emerge as hits in high-throughput screening campaigns. We collected, curated, and integrated a database of compounds tested in 12 major end points comprising over 10,000 unique molecules. We then employed these data to build and validate binary quantitative structure-activity relationship (QSAR) models. All trained models achieved a correct classification rate above 0.60 and a positive predictive value above 0.50. To illustrate their utility in drug discovery, we used these models to predict the pharmacokinetic properties for drugs in the NCATS Inxight Drugs database. In addition, we employed the developed models to predict the pharmacokinetic properties of all compounds in the DrugBank. All models described in this paper have been integrated and made publicly available via the PhaKinPro Web-portal that can be accessed at https://phakinpro.mml.unc.edu/.


Subjects
Quantitative Structure-Activity Relationship, Humans, Internet, Drug Discovery, Pharmaceutical Preparations/metabolism, Pharmaceutical Preparations/chemistry
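
The two thresholds quoted in this abstract (correct classification rate above 0.60, positive predictive value above 0.50) can be computed from a confusion matrix. The sketch below is illustrative only: it assumes that the correct classification rate is the mean of sensitivity and specificity (balanced accuracy), which the abstract does not state, and the function name `classification_metrics` is hypothetical rather than part of PhaKinPro.

```python
def classification_metrics(y_true, y_pred):
    """Return (ccr, ppv) for binary labels where 1 marks the positive class.

    Assumption: CCR is taken to be balanced accuracy,
    i.e. (sensitivity + specificity) / 2.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("need both classes and at least one positive prediction")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ccr = (sensitivity + specificity) / 2
    ppv = tp / (tp + fp)
    return ccr, ppv


# Toy labels, not data from the paper.
ccr, ppv = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                                  [1, 1, 0, 1, 0, 0, 0, 0])
print(round(ccr, 3), round(ppv, 3))
```

Averaging sensitivity and specificity keeps the score informative when actives and inactives are imbalanced, which is common for pharmacokinetic end points.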
2.
ArXiv ; 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38560736

ABSTRACT

Structure-based virtual screening (SBVS) is a key workflow in computational drug discovery. SBVS models are assessed by measuring the enrichment of known active molecules over decoys in retrospective screens. However, the standard formula for enrichment cannot estimate model performance on very large libraries. Additionally, current screening benchmarks cannot easily be used with machine learning (ML) models due to data leakage. We propose an improved formula for calculating VS enrichment and introduce the BayesBind benchmarking set composed of protein targets that are structurally dissimilar to those in the BigBind training set. We assess current models on this benchmark and find that none perform appreciably better than a KNN baseline. We publicly release the BayesBind benchmark at https://github.com/molecularmodelinglab/bigbind.
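
For context, the conventional enrichment factor that this abstract refers to as the standard formula is usually computed as the hit rate in the top-ranked fraction divided by the hit rate in the whole library. The sketch below shows that conventional calculation only; it is not the improved formula proposed by the authors, and the function name and toy data are hypothetical.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Conventional enrichment factor at a given top fraction.

    scores: predicted scores, higher meaning more likely active.
    is_active: 1 for known actives, 0 for decoys.
    """
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    n_total = len(ranked)
    n_selected = max(1, int(round(fraction * n_total)))
    actives_total = sum(is_active)
    if actives_total == 0:
        raise ValueError("at least one active is required")
    actives_selected = sum(label for _, label in ranked[:n_selected])
    # Hit rate among the selected compounds relative to the overall hit rate.
    return (actives_selected / n_selected) / (actives_total / n_total)


# Toy screen: 10 compounds, 2 actives ranked first.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(scores, labels, fraction=0.2))
```

By construction this value can never exceed `n_total / actives_total`, so a benchmark with few decoys per active caps the enrichment that can be measured. That ceiling may be part of what the authors mean when they say the standard formula cannot estimate performance on very large libraries.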

3.
Adv Inf Retr ; 14609: 34-49, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38585224

ABSTRACT

Nearest neighbor-based similarity searching is a common task in chemistry, with notable use cases in drug discovery. Yet, some of the most commonly used approaches for this task still leverage a brute-force approach. In practice this can be computationally costly and overly time-consuming, due in part to the sheer size of modern chemical databases. Previous computational advancements for this task have generally relied on improvements to hardware or dataset-specific tricks that lack generalizability. Approaches that leverage lower-complexity searching algorithms remain relatively underexplored. However, many of these algorithms are approximate solutions and/or struggle with typical high-dimensional chemical embeddings. Here we evaluate whether a combination of low-dimensional chemical embeddings and a k-d tree data structure can achieve fast nearest neighbor queries while maintaining performance on standard chemical similarity search benchmarks. We examine different dimensionality reductions of standard chemical embeddings as well as a learned, structurally aware embedding, SmallSA, for this task. With this framework, searches on over one billion chemicals execute in less than a second on a single CPU core, five orders of magnitude faster than the brute-force approach. We also demonstrate that SmallSA achieves competitive performance on chemical similarity benchmarks.
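
The idea of pairing low-dimensional embeddings with a k-d tree can be illustrated with a small pure-Python sketch. This is a generic textbook k-d tree over random 3-dimensional vectors, not the authors' implementation and not the SmallSA embedding; the function names are hypothetical.

```python
import random


def build_kdtree(points, depth=0):
    """Recursively build a k-d tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, query, depth=0, best=None):
    """Return a stored point with minimal Euclidean distance to `query`."""
    if node is None:
        return best
    point, left, right = node
    if best is None or squared_distance(point, query) < squared_distance(best, query):
        best = point
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < squared_distance(best, query):
        best = nearest(far, query, depth + 1, best)
    return best


# Stand-in "embeddings": random 3-D vectors, purely for illustration.
random.seed(0)
embeddings = [tuple(random.random() for _ in range(3)) for _ in range(1000)]
tree = build_kdtree(embeddings)
print(nearest(tree, (0.5, 0.5, 0.5)))
```

The pruning test is what lets a query skip most of the tree. It becomes less effective as dimensionality grows, which is consistent with the abstract's emphasis on low-dimensional embeddings.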

4.
J Am Chem Soc ; 146(12): 8016-8030, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38470819

ABSTRACT

There have been significant advances in the flexibility and power of in vitro cell-free translation systems. The increasing ability to incorporate noncanonical amino acids and complement translation with recombinant enzymes has enabled cell-free production of peptide-based natural products (NPs) and NP-like molecules. We anticipate that many more such compounds and analogs might be accessed in this way. To assess the peptide NP space that is directly accessible to current cell-free technologies, we developed a peptide parsing algorithm that breaks down peptide NPs into building blocks based on ribosomal translation logic. Using the resultant data set, we broadly analyze the biophysical properties of these privileged compounds and perform a retrobiosynthetic analysis to predict which peptide NPs could be directly synthesized in augmented cell-free translation reactions. We then tested these predictions by preparing a library of highly modified peptide NPs. Two macrocyclases, PatG and PCY1, were used to effect the head-to-tail macrocyclization of candidate NPs. This retrobiosynthetic analysis identified a collection of high-priority building blocks that are enriched throughout peptide NPs, yet they had not previously been tested in cell-free translation. To expand the cell-free toolbox into this space, we established, optimized, and characterized the flexizyme-enabled ribosomal incorporation of piperazic acids. Overall, these results demonstrate the feasibility of cell-free translation for peptide NP total synthesis while expanding the limits of the technology. This work provides a novel computational tool for exploration of peptide NP chemical space, that could be expanded in the future to allow design of ribosomal biosynthetic pathways for NPs and NP-like molecules.


Subjects
Biological Products, Biological Products/chemistry, Cheminformatics, Peptides/chemistry, Peptide Biosynthesis, Amino Acids
5.
Bioinformatics ; 40(1)2024 01 02.
Article in English | MEDLINE | ID: mdl-38175789

ABSTRACT

SUMMARY: Knowledge graphs are being increasingly used in biomedical research to link large amounts of heterogeneous data and facilitate reasoning across diverse knowledge sources. Wider adoption and exploration of knowledge graphs in the biomedical research community is limited by requirements to understand the underlying graph structure in terms of entity types and relationships, represented as nodes and edges, respectively, and learn specialized query languages for graph mining and exploration. We have developed a user-friendly interface dubbed ExEmPLAR (Extracting, Exploring, and Embedding Pathways Leading to Actionable Research) to aid reasoning over biomedical knowledge graphs and assist with data-driven research and hypothesis generation. We explain the key functionalities of ExEmPLAR and demonstrate its use with a case study considering the relationship of Trypanosoma cruzi, the etiological agent of Chagas disease, to frequently associated cardiovascular conditions. AVAILABILITY AND IMPLEMENTATION: ExEmPLAR is freely accessible at https://www.exemplar.mml.unc.edu/. For code and instructions for using the application, see: https://github.com/beasleyjonm/AOP-COP-Path-Extractor.


Subjects
Biomedical Research, Automated Pattern Recognition
6.
Mol Inform ; 43(1): e202300207, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37802967

ABSTRACT

Recent rapid expansion of make-on-demand, purchasable, chemical libraries comprising dozens of billions or even trillions of molecules has challenged the efficient application of traditional structure-based virtual screening methods that rely on molecular docking. We present a novel computational methodology termed HIDDEN GEM (HIt Discovery using Docking ENriched by GEnerative Modeling) that greatly accelerates virtual screening. This workflow uniquely integrates machine learning, generative chemistry, massive chemical similarity searching and molecular docking of small, selected libraries in the beginning and the end of the workflow. For each target, HIDDEN GEM nominates a small number of top-scoring virtual hits prioritized from ultra-large chemical libraries. We have benchmarked HIDDEN GEM by conducting virtual screening campaigns for 16 diverse protein targets using Enamine REAL Space library comprising 37 billion molecules. We show that HIDDEN GEM yields the highest enrichment factors as compared to state of the art accelerated virtual screening methods, while requiring the least computational resources. HIDDEN GEM can be executed with any docking software and employed by users with limited computational resources.


Subjects
Small Molecule Libraries, Software, Small Molecule Libraries/chemistry, Molecular Docking Simulation, Workflow
7.
Nat Rev Drug Discov ; 23(2): 141-155, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38066301

ABSTRACT

Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.


Subjects
Deep Learning, Quantitative Structure-Activity Relationship, Humans, Artificial Intelligence, Computing Methodologies, Quantum Theory, Drug Discovery/methods, Drug Design
8.
J Chem Inf Model ; 64(7): 2488-2495, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38113513

ABSTRACT

Deep learning methods that predict protein-ligand binding have recently been used for structure-based virtual screening. Many such models have been trained using protein-ligand complexes with known crystal structures and activities from the PDBbind data set. However, because PDBbind only includes 20K complexes, models typically fail to generalize to new targets, and model performance is on par with models trained with only ligand information. Conversely, the ChEMBL database contains a wealth of chemical activity information but includes no information about binding poses. We introduce BigBind, a data set that maps ChEMBL activity data to proteins from the CrossDocked data set. BigBind comprises 583K ligand activities and includes 3D structures of the protein binding pockets. Additionally, we augmented the data by adding an equal number of putative inactives for each target. Using this data, we developed Banana (basic neural network for binding affinity), a neural network-based model to classify active from inactive compounds, defined by a 10 µM cutoff. Our model achieved an AUC of 0.72 on BigBind's test set, while a ligand-only model achieved an AUC of 0.59. Furthermore, Banana achieved competitive performance on the LIT-PCBA benchmark (median EF1% 1.81) while running 16,000 times faster than molecular docking with Gnina. We suggest that Banana, as well as other models trained on this data set, will significantly improve the outcomes of prospective virtual screening tasks.


Subjects
Proteins, Ubiquitin-Protein Ligases, Molecular Docking Simulation, Ligands, Prospective Studies, Proteins/chemistry, Protein Binding, Ubiquitin-Protein Ligases/metabolism
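
The evaluation described in this abstract (labeling compounds with a 10 µM cutoff, then reporting an AUC) can be sketched as follows. This is a generic illustration with toy numbers, not the BigBind or Banana code. It assumes activities are expressed in µM and that a compound at exactly the cutoff counts as active, neither of which the abstract specifies. The function names are hypothetical.

```python
def label_actives(activities_um, cutoff_um=10.0):
    """Label a compound active (1) when its activity is at or below the cutoff.

    Assumption: lower values mean more potent, and the cutoff is inclusive.
    """
    return [1 if a <= cutoff_um else 0 for a in activities_um]


def roc_auc(scores, labels):
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("both actives and inactives are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy measurements (µM) and toy model scores, not data from the paper.
labels = label_actives([0.5, 3.0, 10.0, 25.0, 100.0])
print(labels, roc_auc([0.9, 0.4, 0.8, 0.6, 0.1], labels))
```

The pairwise definition is quadratic in the number of compounds, which is fine for a sketch. A rank-based formulation gives the same value more efficiently on large test sets.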
9.
J Alzheimers Dis ; 96(2): 499-505, 2023.
Article in English | MEDLINE | ID: mdl-37807778

ABSTRACT

Vaccine repurposing that considers individual genotype may aid personalized prevention of Alzheimer's disease (AD). In this retrospective cohort study, we used Cardiovascular Health Study data to estimate associations of pneumococcal polysaccharide vaccine and flu shots received between ages 65-75 with AD onset at age 75 or older, taking into account rs6859 polymorphism in NECTIN2 gene (AD risk factor). Pneumococcal vaccine, and total count of vaccinations against pneumonia and flu, were associated with lower odds of AD in carriers of rs6859 A allele, but not in non-carriers. We conclude that pneumococcal polysaccharide vaccine is a promising candidate for genotype-tailored AD prevention.


Subjects
Alzheimer Disease, Pneumococcal Pneumonia, Humans, Aged, Pneumococcal Pneumonia/prevention & control, Retrospective Studies, Alzheimer Disease/genetics, Alzheimer Disease/prevention & control, Vaccination, Pneumococcal Vaccines/therapeutic use, Genotype
10.
J Cheminform ; 15(1): 82, 2023 Sep 19.
Article in English | MEDLINE | ID: mdl-37726809

ABSTRACT

We report the major highlights of the School of Cheminformatics in Latin America, Mexico City, November 24-25, 2022. Six lectures, one workshop, and one roundtable with four editors were presented during an online public event with speakers from academia, big pharma, and public research institutions. One thousand one hundred eighty-one students and academics from seventy-nine countries registered for the meeting. As part of the meeting, advances in enumeration and visualization of chemical space, applications in natural product-based drug discovery, drug discovery for neglected diseases, toxicity prediction, and general guidelines for data analysis were discussed. Experts from ChEMBL presented a workshop on how to use the resources of this major compounds database used in cheminformatics. The school also included a round table with editors of cheminformatics journals. The full program of the meeting and the recordings of the sessions are publicly available at https://www.youtube.com/@SchoolChemInfLA/featured.

11.
Proteins ; 91(12): 1822-1828, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37697630

ABSTRACT

In the ligand prediction category of CASP15, the challenge was to predict the positions and conformations of small molecules binding to proteins that were provided as amino acid sequences or as models generated by the AlphaFold2 program. For most targets, we used our template-based ligand docking program ClusPro ligTBM, also implemented as a public server available at https://ligtbm.cluspro.org/. Since many targets had multiple chains and a number of ligands, several templates and some manual interventions were required. In a few cases, no templates were found, and we had to use direct docking using the Glide program. Nevertheless, ligTBM was shown to be a very useful tool, and by any ranking criteria, our group was ranked among the top five best-performing teams. In fact, all the best groups used template-based docking methods. Thus, it appears that the AlphaFold2-generated models, despite the high accuracy of the predicted backbone, have local differences from the x-ray structure that make the use of direct docking methods more challenging. The results of CASP15 confirm that this limitation can be frequently overcome by homology-based docking.


Subjects
Proteins, Software, Protein Conformation, Molecular Docking Simulation, Ligands, Proteins/chemistry, Protein Binding, Binding Sites
12.
J Med Chem ; 66(18): 12828-12839, 2023 09 28.
Article in English | MEDLINE | ID: mdl-37677128

ABSTRACT

Hits from high-throughput screening (HTS) of chemical libraries are often false positives due to their interference with assay detection technology. In response, we generated the largest publicly available library of chemical liabilities and developed "Liability Predictor," a free web tool to predict HTS artifacts. More specifically, we generated, curated, and integrated HTS data sets for thiol reactivity, redox activity, and luciferase (firefly and nano) activity and developed and validated quantitative structure-interference relationship (QSIR) models to predict these nuisance behaviors. The resulting models showed 58-78% external balanced accuracy for 256 external compounds per assay. QSIR models developed and validated herein identify nuisance compounds among experimental hits more reliably than do popular PAINS filters. Both the models and the curated data sets were implemented in "Liability Predictor," publicly available at https://liability.mml.unc.edu/. "Liability Predictor" may be used as part of chemical library design or for triaging HTS hits.


Subjects
Artifacts, High-Throughput Screening Assays, High-Throughput Screening Assays/methods, Small Molecule Libraries/chemistry
13.
NPJ Vaccines ; 8(1): 129, 2023 Sep 01.
Article in English | MEDLINE | ID: mdl-37658087

ABSTRACT

COVID-19 vaccines have been instrumental tools in the fight against SARS-CoV-2 helping to reduce disease severity and mortality. At the same time, just like any other therapeutic, COVID-19 vaccines were associated with adverse events. Women have reported menstrual cycle irregularity after receiving COVID-19 vaccines, and this led to renewed fears concerning COVID-19 vaccines and their effects on fertility. Herein we devised an informatics workflow to explore the causal drivers of menstrual cycle irregularity in response to vaccination with mRNA COVID-19 vaccine BNT162b2. Our methods relied on gene expression analysis in response to vaccination, followed by network biology analysis to derive testable hypotheses regarding the causal links between BNT162b2 and menstrual cycle irregularity. Five high-confidence transcription factors were identified as causal drivers of BNT162b2-induced menstrual irregularity, namely: IRF1, STAT1, RelA (p65 NF-kB subunit), STAT2 and IRF3. Furthermore, some biomarkers of menstrual irregularity, including TNF, IL6R, IL6ST, LIF, BIRC3, FGF2, ARHGDIB, RPS3, RHOU, MIF, were identified as topological genes and predicted as causal drivers of menstrual irregularity. Our network-based mechanism reconstruction results indicated that BNT162b2 exerted biological effects similar to those resulting from prolactin signaling. However, these effects were short-lived and didn't raise concerns about long-term infertility issues. This approach can be applied to interrogate the functional links between drugs/vaccines and other side effects.

14.
FEMS Microbiol Rev ; 47(5)2023 09 05.
Article in English | MEDLINE | ID: mdl-37596064

ABSTRACT

Understanding the origins of past and present viral epidemics is critical in preparing for future outbreaks. Many viruses, including SARS-CoV-2, have led to significant consequences not only due to their virulence, but also because we were unprepared for their emergence. We need to learn from large amounts of data accumulated from well-studied, past pandemics and employ modern informatics and therapeutic development technologies to forecast future pandemics and help minimize their potential impacts. While acknowledging the complexity and difficulties associated with establishing reliable outbreak predictions, herein we provide a perspective on the regions of the world that are most likely to be impacted by future outbreaks. We specifically focus on viruses with epidemic potential, namely SARS-CoV-2, MERS-CoV, DENV, ZIKV, MAYV, LASV, noroviruses, influenza, Nipah virus, hantaviruses, Oropouche virus, MARV, and Ebola virus, which all require attention from both the public and scientific community to avoid societal catastrophes like COVID-19. Based on our literature review, data analysis, and outbreak simulations, we posit that these future viral epidemics are unavoidable, but that their societal impacts can be minimized by strategic investment into basic virology research, epidemiological studies of neglected viral diseases, and antiviral drug discovery.


Subjects
COVID-19, Zika Virus Infection, Zika Virus, Humans, COVID-19/epidemiology, SARS-CoV-2, Disease Outbreaks
15.
ArXiv ; 2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37547658

ABSTRACT

Molecular docking aims to predict the 3D pose of a small molecule in a protein binding site. Traditional docking methods predict ligand poses by minimizing a physics-inspired scoring function. Recently, a diffusion model has been proposed that iteratively refines a ligand pose. We combine these two approaches by training a pose scoring function in a diffusion-inspired manner. In our method, PLANTAIN, a neural network is used to develop a very fast pose scoring function. We parameterize a simple scoring function on the fly and use L-BFGS minimization to optimize an initially random ligand pose. Using rigorous benchmarking practices, we demonstrate that our method achieves state-of-the-art performance while running ten times faster than the next-best method. We release PLANTAIN publicly and hope that it improves the utility of virtual screening workflows.

16.
Antiviral Res ; 217: 105620, 2023 09.
Article in English | MEDLINE | ID: mdl-37169224

ABSTRACT

Diseases caused by new viruses cost thousands if not millions of human lives and trillions of dollars. We have identified, collected, curated, and integrated all chemogenomics data from ChEMBL for 13 emerging viruses that hold the greatest potential threat to global human health. By identifying and solving several challenges related to data annotation accuracy, we developed a highly curated and thoroughly annotated database of compounds tested in both phenotypic and target-based assays for these viruses that we dubbed SMACC (Small Molecule Antiviral Compound Collection). The pilot version of the SMACC database contains over 32,500 entries for 13 viruses. By analyzing data in SMACC, we have identified ∼50 compounds with a polyviral inhibition profile, mostly covering flavi- and coronaviruses. The SMACC database may serve as a reference for virologists and medicinal chemists working on the development of novel broad-spectrum antiviral (BSA) agents in preparation for future viral outbreaks. SMACC is publicly available at https://smacc.mml.unc.edu.


Subjects
Coronavirus Infections, Viruses, Humans, Antiviral Agents/pharmacology, Viruses/genetics, Factual Databases
17.
J Control Release ; 353: 903-914, 2023 01.
Article in English | MEDLINE | ID: mdl-36402234

ABSTRACT

Active learning (AL) has become a subject of active recent research both in industry and academia as an efficient approach for rapid design and discovery of novel chemicals, materials, and polymers. Herein, we have assessed the applicability of AL for the discovery of polymeric micelle formulations for poorly soluble drugs. We were motivated by the key advantages of this approach making it a desirable strategy for rational design of drug delivery systems due to its ability to (i) employ relatively small datasets for model development, (ii) iterate between model development and model assessment using small external datasets that can be either generated in focused experimental studies or formed from subsets of the initial training data, and (iii) progressively evolve models towards increasingly more reliable predictions and the identification of novel chemicals with the desired properties. In this study, we compared various AL protocols for their effectiveness in finding biologically active molecules using synthetic datasets. We have investigated the dependency of AL performance on the size of the initial training set, the relative complexity of the task, and the choice of the initial training dataset. We found that AL techniques as applied to regression modeling offer no benefits over random search, while AL used for classification tasks performs better than models built for randomly selected training sets but still quite far from perfect. Finally, the best performing AL approach was employed to discover and experimentally validate novel binding polymers for a case study of asialoglycoprotein receptor (ASGPR).


Subjects
Polymers, Problem-Based Learning, Polymers/chemistry, Micelles, Drug Delivery Systems, Peptides
18.
Regul Toxicol Pharmacol ; 136: 105277, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36288772

ABSTRACT

Exogenous metal particles and ions from implant devices are known to cause severe toxic events with symptoms ranging from adverse local tissue reactions to systemic toxicities, potentially leading to the development of cancers, heart conditions, and neurological disorders. Toxicity mechanisms, also known as Adverse Outcome Pathways (AOPs), that explain these metal-induced toxicities are severely understudied. Therefore, we deployed in silico structure- and knowledge-based approaches to identify proteome-level perturbations caused by metals and pathways that link these events to human diseases. We captured 177 structure-based, 347 knowledge-based, and 402 imputed metal-gene/protein relationships for chromium, cobalt, molybdenum, nickel, and titanium. We prioritized 72 proteins hypothesized to directly contact implant surfaces and contribute to adverse outcomes. Results of this exploratory analysis were formalized as structured AOPs. We considered three case studies reflecting the following possible situations: (i) the metal-protein-disease relationship was previously known; (ii) the metal-protein, protein-disease, and metal-disease relationships were individually known but were not linked (as a unified AOP); and (iii) one of three relationships was unknown and was imputed by our methods. These situations were illustrated by case studies on nickel-induced allergy/hypersensitivity, cobalt-induced heart failure, and titanium-induced periprosthetic osteolysis, respectively. All workflows, data, and results are freely available in https://github.com/DnlRKorn/Knowledge_Based_Metallomics/. An interactive view of select data is available at the ROBOKOP Neo4j Browser at http://robokopkg.renci.org/browser/.


Subjects
Adverse Outcome Pathways, Nickel, Humans, Nickel/adverse effects, Titanium/toxicity, Metals/toxicity, Cobalt, Chromium
19.
Toxicol Sci ; 189(2): 250-259, 2022 09 24.
Article in English | MEDLINE | ID: mdl-35916740

ABSTRACT

In the United States, a pre-market regulatory submission for any medical device that comes into contact with either a patient or the clinical practitioner must include an adequate toxicity evaluation of chemical substances that can be released from the device during its intended use. These substances, also referred to as extractables and leachables, must be evaluated for their potential to induce sensitization/allergenicity, which traditionally has been done in animal assays such as the guinea pig maximization test (GPMT). However, advances in basic and applied science are continuously presenting opportunities to employ new approach methodologies, including computational methods which, when qualified, could replace animal testing methods to support regulatory submissions. Herein, we developed a new computational tool for rapid and accurate prediction of the GPMT outcome that we have named PreS/MD (predictor of sensitization for medical devices). To enable model development, we (1) collected, curated, and integrated the largest publicly available dataset for GPMT results; (2) succeeded in developing externally predictive (balanced accuracy of 70%-74% as evaluated by both 5-fold external cross-validation and testing of novel compounds) quantitative structure-activity relationships (QSAR) models for GPMT using machine learning algorithms, including deep learning; and (3) developed a publicly accessible web portal integrating PreS/MD models that can predict GPMT outcomes for any molecule of interest. We expect that PreS/MD will be used by both industry and regulatory scientists in medical device safety assessments and help replace, reduce, or refine the use of animals in toxicity testing. PreS/MD is freely available at https://presmd.mml.unc.edu/.


Subjects
Allergens, Toxicity Tests, Algorithms, Animals, Guinea Pigs, Machine Learning, Quantitative Structure-Activity Relationship, Toxicity Tests/methods
20.
bioRxiv ; 2022 Jul 11.
Article in English | MEDLINE | ID: mdl-35860225

ABSTRACT

Diseases caused by new viruses cost thousands if not millions of human lives and trillions of dollars in damage to the global economy. Despite the rapid development of vaccines for SARS-CoV-2, the lack of small molecule antiviral drugs that work against multiple viral families (broad-spectrum antivirals; BSAs) has left the entire world’s human population vulnerable to the infection between the beginning of the outbreak and the widespread availability of vaccines. Developing BSAs is an attractive, yet challenging, approach that could prevent the next, inevitable, viral outbreak from becoming a global catastrophe. To explore whether historical medicinal chemistry efforts suggest the possibility of discovering novel BSAs, we (i) identified, collected, curated, and integrated all chemical bioactivity data available in ChEMBL for molecules tested in respective assays for 13 emerging viruses that, based on published literature, hold the greatest potential threat to global human health; (ii) identified and solved the challenges related to data annotation accuracy including assay description ambiguity, missing cell or target information, and incorrect BioAssay Ontology (BAO) annotations; (iii) developed a highly curated and thoroughly annotated database of compounds tested in both phenotypic (21,392 entries) and target-based (11,123 entries) assays for these viruses; and (iv) identified a subset of compounds showing BSA activity. For the latter task, we eliminated inconclusive and annotated duplicative entries by checking the concordance between multiple assay results and identified eight compounds active against 3-4 viruses from the phenotypic data, 16 compounds active against two viruses from the target-based data, and 35 compounds active in at least one phenotypic and one target-based assay. The pilot version of our SMACC (Small Molecule Antiviral Compound Collection) database contains over 32,500 entries for 13 viruses.
Our analysis indicates that previous research yielded a very small number of BSA compounds. We posit that focused and coordinated efforts strategically targeting the discovery of such agents must be established and maintained going forward. The SMACC database, publicly available at https://smacc.mml.unc.edu, may serve as a reference for virologists and medicinal chemists working on the development of novel BSA agents in preparation for future viral outbreaks.
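
Step (iv) of this abstract, which checks concordance between multiple assay results and then looks for compounds active against several viruses, can be illustrated with a small grouping sketch. The record layout `(compound_id, virus, is_active)`, the function name, and the rule that every result for a compound-virus pair must agree are assumptions made for illustration. They are not taken from the SMACC curation workflow.

```python
from collections import defaultdict


def multi_virus_actives(records, min_viruses=2):
    """Return {compound: [viruses]} for compounds concordantly active on several viruses.

    records: iterable of (compound_id, virus, is_active) tuples (hypothetical layout).
    """
    outcomes = defaultdict(set)
    for compound, virus, is_active in records:
        outcomes[(compound, virus)].add(bool(is_active))

    viruses_hit = defaultdict(set)
    for (compound, virus), seen in outcomes.items():
        # Keep only concordant positives: every result for this pair reported activity.
        if seen == {True}:
            viruses_hit[compound].add(virus)

    return {c: sorted(v) for c, v in viruses_hit.items() if len(v) >= min_viruses}


# Toy records, not entries from the database.
records = [
    ("C1", "DENV", True), ("C1", "ZIKV", True), ("C1", "ZIKV", True),
    ("C2", "DENV", True), ("C2", "ZIKV", True), ("C2", "ZIKV", False),
    ("C3", "MERS-CoV", True),
]
print(multi_virus_actives(records))
```

Here `C2` is excluded because its two ZIKV results disagree, which mirrors the idea of setting aside inconclusive entries before counting viruses per compound.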
